
feat: add parallel distance computation and vectorized pipeline #43

Open
vc1492a wants to merge 10 commits into dev from feature/numba_parallel

Conversation

@vc1492a
Owner

@vc1492a vc1492a commented Sep 17, 2020

Summary

This PR addresses #36 by implementing parallelization and vectorization of PyNomaly's distance computation and pipeline, rebased onto the current v0.3.5 codebase.

Changes

  • Vectorized kNN distances: Replaced the O(n²) Python nested loop with chunked NumPy broadcasting (+ optional scipy.spatial.distance.cdist), yielding significant speedups without new required dependencies
  • n_jobs parameter: Added cross-cluster multiprocessing via concurrent.futures.ProcessPoolExecutor. Set n_jobs=-1 to use all CPU cores. Follows the scikit-learn convention
  • Numba parallel mode: Restructured the Numba path with non-generator kernels using numba.prange for proper thread-level parallelism (the previous generator-based approach was incompatible with Numba's parallel mode)
  • Optional scipy acceleration: Uses scipy.spatial.distance.cdist for distance computation and scipy.special.erf for the error function when scipy is available, with graceful fallback to pure NumPy
  • Vectorized pipeline: Replaced Python for loops in _standard_distances, _prob_distances, and _norm_prob_outlier_factor with vectorized NumPy operations
  • Progress bar preserved: Progress bars work across all execution modes (sequential, parallel, Numba) with chunk-level or cluster-level granularity
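
To illustrate the chunked broadcasting idea from the first bullet, here is a minimal sketch. It is not the code in this PR: the function name `knn_distances_chunked`, its parameters, and the default chunk size are hypothetical, and it assumes Euclidean distance with each point excluded from its own neighbor list.

```python
import numpy as np


def knn_distances_chunked(points, k, chunk_size=256):
    """Return (distances, indices) of the k nearest neighbors of every point.

    Hypothetical helper: rows are processed in chunks so that the
    intermediate difference array stays at (chunk_size, n, d) instead of
    (n, n, d).
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    dists = np.empty((n, k))
    idxs = np.empty((n, k), dtype=np.intp)

    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Broadcasting: (m, 1, d) - (1, n, d) -> (m, n, d)
        diff = points[start:stop, None, :] - points[None, :, :]
        d = np.sqrt(np.einsum("ijk,ijk->ij", diff, diff))
        # Mask each point's distance to itself so it is never a neighbor.
        d[np.arange(stop - start), np.arange(start, stop)] = np.inf
        order = np.argsort(d, axis=1)[:, :k]
        idxs[start:stop] = order
        dists[start:stop] = np.take_along_axis(d, order, axis=1)
        # A progress bar could be updated here, once per chunk.

    return dists, idxs


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
    distances, neighbors = knn_distances_chunked(pts, k=2, chunk_size=3)
    print(distances)
    print(neighbors)
```

The chunk size trades memory for fewer Python-level iterations; the result does not depend on it.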

API

Fully backward-compatible. The only addition is the optional n_jobs parameter (default 1):

loop.LocalOutlierProbability(data, n_jobs=-1).fit()

All existing function calls, examples, and usage patterns continue to work unchanged.
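
As a rough sketch of how an n_jobs value might be resolved and used to spread per-cluster work over processes: the helper names `resolve_n_jobs` and `run_per_cluster` are hypothetical and not part of PyNomaly, and rejecting 0 and values below -1 is an assumption about what counts as invalid.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def resolve_n_jobs(n_jobs):
    """Translate an n_jobs setting into a worker count (hypothetical helper)."""
    if n_jobs == -1:
        # -1 means "use all CPU cores"; cpu_count() can return None.
        return os.cpu_count() or 1
    if isinstance(n_jobs, int) and n_jobs >= 1:
        return n_jobs
    # Assumption: anything else is treated as invalid.
    raise ValueError("n_jobs must be -1 or a positive integer")


def run_per_cluster(func, clusters, n_jobs=1):
    """Apply func to each cluster, using processes when n_jobs allows it.

    func must be picklable (for example a module-level function) when more
    than one worker is used.
    """
    workers = min(resolve_n_jobs(n_jobs), max(len(clusters), 1))
    if workers == 1:
        # Sequential path: no process start-up or pickling cost.
        return [func(cluster) for cluster in clusters]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves the input order of the clusters.
        return list(pool.map(func, clusters))


if __name__ == "__main__":
    clusters = [[1.0, 2.0, 3.0], [4.0, 5.0]]
    print(run_per_cluster(len, clusters, n_jobs=-1))
```

With a single cluster the sketch falls back to the sequential path, since there is nothing to distribute.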

Testing

  • All 26 existing tests pass unchanged
  • 3 new tests added: test_n_jobs_equivalence, test_n_jobs_single_cluster, test_n_jobs_invalid

Closes #36

@vc1492a vc1492a added enhancement New feature of request in progress This issue is being actively worked on labels Sep 17, 2020
@vc1492a vc1492a self-assigned this Sep 17, 2020
@coveralls

coveralls commented Sep 17, 2020

Pull Request Test Coverage Report for Build 142

  • 32 of 44 (72.73%) changed or added relevant lines in 1 file are covered.
  • 11 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-6.2%) to 93.188%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
PyNomaly/loop.py | 32 | 44 | 72.73%

Files with Coverage Reduction | New Missed Lines | %
PyNomaly/loop.py | 11 | 93.19%

Totals
Change from base Build 126: -6.2%
Covered Lines: 342
Relevant Lines: 367

💛 - Coveralls

@vc1492a
Owner Author

vc1492a commented Sep 17, 2020

On IBM Power8:

(venv-pynomaly) vconstan@SNA-MINSKY-N03:~/projects/PyNomaly$ python examples/numba_speed_diff.py
/home/vconstan/projects/PyNomaly/PyNomaly/loop.py:518: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function _compute_distance_and_neighbor_matrix failed at nopython mode lowering due to: scipy 0.16+ is required for linear algebra

File "PyNomaly/loop.py", line 537:
    def _compute_distance_and_neighbor_matrix(
        <source elided>
                diff = clust_points_vector[p[0]] - clust_points_vector[p[1]]
                d = np.dot(diff, diff) ** 0.5
                ^

During: lowering "$88call_method.23 = call $82load_method.20(diff, diff, func=$82load_method.20, args=[Var(diff, loop.py:536), Var(diff, loop.py:536)], kws=(), vararg=None)" at /home/vconstan/projects/PyNomaly/PyNomaly/loop.py (537)
  @staticmethod
/home/vconstan/.conda/envs/venv-pynomaly/lib/python3.8/site-packages/numba/core/object_mode_passes.py:177: NumbaWarning: Function "_compute_distance_and_neighbor_matrix" was compiled in object mode without forceobj=True.

File "PyNomaly/loop.py", line 519:
    @staticmethod
    def _compute_distance_and_neighbor_matrix(
    ^

  warnings.warn(errors.NumbaWarning(warn_msg,
/home/vconstan/.conda/envs/venv-pynomaly/lib/python3.8/site-packages/numba/core/object_mode_passes.py:187: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "PyNomaly/loop.py", line 519:
    @staticmethod
    def _compute_distance_and_neighbor_matrix(
    ^

  warnings.warn(errors.NumbaDeprecationWarning(msg,

@vc1492a
Owner Author

vc1492a commented Sep 17, 2020

The issue above on IBM Power8 was caused by an environment problem: scipy was not installed. Numba needs scipy for linear algebra calls such as np.dot, so scipy is now listed as an optional requirement in readme.md.
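
For context, the warning in the log comes from np.dot, which Numba can only compile in nopython mode when scipy is installed. The sketch below is an illustration only and not the fix applied here (the fix was to install scipy): it computes the same Euclidean distance with an explicit loop, which does not go through Numba's linear algebra support. The function name `euclidean` and the no-op `njit` fallback are hypothetical.

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (uncompiled) without Numba.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit
def euclidean(a, b):
    """Euclidean distance between two 1-D arrays without calling np.dot."""
    total = 0.0
    for i in range(a.shape[0]):
        diff = a[i] - b[i]
        total += diff * diff
    return math.sqrt(total)


if __name__ == "__main__":
    print(euclidean(np.array([0.0, 0.0]), np.array([3.0, 4.0])))
```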

No Parallelization, only Numba JIT
[Screenshot: Screen Shot 2020-09-17 at 9 14 13 AM]

Numba JIT with Parallelization
[Screenshot: Screen Shot 2020-09-17 at 9 14 28 AM]

🚀 🚀 🚀

@vc1492a
Owner Author

vc1492a commented Sep 17, 2020

Because using more cores also increases communication overhead between the parallel threads, it would be useful to let users set the number of threads that execute concurrently.

This seems to be set through a Numba environment variable, and it may be worth exposing as an additional, optional parameter when executing distance calculations in parallel: https://numba.pydata.org/numba-doc/latest/user/threading-layer.html#setting-the-number-of-threads
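
A minimal sketch of how such a parameter could be passed through to Numba, assuming a Numba version that provides `numba.set_num_threads` and `numba.config.NUMBA_NUM_THREADS` (both are described on the linked page). The wrapper name `request_threads` is hypothetical, and it returns None when Numba is not installed.

```python
def request_threads(num_threads):
    """Ask Numba to use at most num_threads threads (hypothetical wrapper).

    Returns the thread count actually applied, or None without Numba.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be a positive integer")
    try:
        import numba
    except ImportError:
        return None
    # The runtime value may not exceed NUMBA_NUM_THREADS, which is fixed
    # when Numba is imported (from the environment variable or core count).
    capped = min(num_threads, numba.config.NUMBA_NUM_THREADS)
    numba.set_num_threads(capped)
    return capped


if __name__ == "__main__":
    print(request_threads(2))
```

The upper bound itself can only be changed by setting the NUMBA_NUM_THREADS environment variable before Numba is imported.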

@vc1492a vc1492a mentioned this pull request Sep 17, 2020
@vc1492a
Owner Author

vc1492a commented Sep 17, 2020

Added a num_threads parameter that can be used to specify the number of threads. So far, adding more threads (at least with how the parallelism is currently implemented) seems to increase computation time when processing 25,000 values.

[ ================================================================================ ] 100.00%
Computation took 94.4145040512085 seconds with Numba JIT with parallel processing, using 1 thread.
[ ================================================================================ ] 100.00%
Computation took 114.98689579963684 seconds with Numba JIT with parallel processing, using 2 thread.
[ ================================================================================ ] 100.00%
Computation took 139.79329085350037 seconds with Numba JIT with parallel processing, using 3 thread.
[ ================================================================================ ] 100.00%
Computation took 168.51009488105774 seconds with Numba JIT with parallel processing, using 4 thread.

More investigation is needed to see whether the above behavior is machine-specific or code-related, but we now have the ability to parallelize distinct portions of the code and to set the number of threads when using Numba.
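
When comparing thread counts like this, a small harness along the following lines may help separate one-time costs from steady-state run time. It is only a sketch: `time_runs` and `report` are hypothetical names, and whether the timings above include JIT compilation is not known. The optional warm-up call is there because the first call to a Numba-compiled function includes compilation.

```python
import time


def time_runs(func, thread_counts, warmup=True):
    """Time func(n) for each thread count n and return {n: seconds}."""
    timings = {}
    for n in thread_counts:
        if warmup:
            # Untimed call so compilation and caches do not skew the result.
            func(n)
        start = time.perf_counter()
        func(n)
        timings[n] = time.perf_counter() - start
    return timings


def report(timings):
    """Print one line per thread count."""
    for n, seconds in timings.items():
        print(f"Computation took {seconds:.3f} seconds using {n} thread(s).")


if __name__ == "__main__":
    # Stand-in workload; a real run would call the fit with num_threads=n.
    report(time_runs(lambda n: sum(range(100_000)), [1, 2, 3, 4]))
```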

@vc1492a
Owner Author

vc1492a commented Sep 18, 2020

Results from another machine:

[ ================================================================================ ] 100.00%
Computation took 34.91723585128784 seconds with Numba JIT with parallel processing, using 1 thread(s).
[ ================================================================================ ] 100.00%
Computation took 32.24922227859497 seconds with Numba JIT with parallel processing, using 2 thread(s).
[ ================================================================================ ] 100.00%
Computation took 30.427764892578125 seconds with Numba JIT with parallel processing, using 3 thread(s).
[ ================================================================================ ] 100.00%
Computation took 30.22746515274048 seconds with Numba JIT with parallel processing, using 4 thread(s).

@vc1492a vc1492a added the help wanted Extra attention is needed label Sep 18, 2020
@vc1492a
Owner Author

vc1492a commented Oct 1, 2020

[ ================================================================================ ] 100.00%
Computation took 50.41339111328125 seconds with Numba JIT with parallel processing, using 1 thread(s).
[ ================================================================================ ] 100.00%
Computation took 64.93466305732727 seconds with Numba JIT with parallel processing, using 2 thread(s).
[ ================================================================================ ] 100.00%
Computation took 59.55153703689575 seconds with Numba JIT with parallel processing, using 3 thread(s).
[ ================================================================================ ] 100.00%
Computation took 60.493231773376465 seconds with Numba JIT with parallel processing, using 4 thread(s).
[ ================================================================================ ] 100.00%
Computation took 62.03501510620117 seconds with Numba JIT with parallel processing, using 5 thread(s).
[ ================================================================================ ] 100.00%
Computation took 62.178765058517456 seconds with Numba JIT with parallel processing, using 6 thread(s).
[ ================================================================================ ] 100.00%
Computation took 65.13408589363098 seconds with Numba JIT with parallel processing, using 7 thread(s).
[ ================================================================================ ] 100.00%
Computation took 65.27309513092041 seconds with Numba JIT with parallel processing, using 8 thread(s).
[ ================================================================================ ] 100.00%
Computation took 62.19127082824707 seconds with Numba JIT with parallel processing, using 9 thread(s).
[ ================================================================================ ] 100.00%
Computation took 59.75213074684143 seconds with Numba JIT with parallel processing, using 10 thread(s).
[ ================================================================================ ] 100.00%
Computation took 57.64805293083191 seconds with Numba JIT with parallel processing, using 11 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.80255579948425 seconds with Numba JIT with parallel processing, using 12 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.80128788948059 seconds with Numba JIT with parallel processing, using 13 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.00968599319458 seconds with Numba JIT with parallel processing, using 14 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.198336124420166 seconds with Numba JIT with parallel processing, using 15 thread(s).
[ ================================================================================ ] 100.00%
Computation took 57.532896995544434 seconds with Numba JIT with parallel processing, using 16 thread(s).

Results from another run.

@medvidov

medvidov commented Oct 3, 2020

Results from another machine (4 core CPU, running from WSL):

[ ================================================================================ ] 100.00%
Computation took 51.52172231674194 seconds with Numba JIT with parallel processing, using 1 thread(s).
[ ================================================================================ ] 100.00%
Computation took 54.880839347839355 seconds with Numba JIT with parallel processing, using 2 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.5437228679657 seconds with Numba JIT with parallel processing, using 3 thread(s).
[ ================================================================================ ] 100.00%
Computation took 54.710304260253906 seconds with Numba JIT with parallel processing, using 4 thread(s).
[ ================================================================================ ] 100.00%
Computation took 56.60258507728577 seconds with Numba JIT with parallel processing, using 5 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.15400314331055 seconds with Numba JIT with parallel processing, using 6 thread(s).
[ ================================================================================ ] 100.00%
Computation took 55.54375123977661 seconds with Numba JIT with parallel processing, using 7 thread(s).
[ ================================================================================ ] 100.00%
Computation took 54.39351201057434 seconds with Numba JIT with parallel processing, using 8 thread(s).

@vc1492a
Owner Author

vc1492a commented Feb 3, 2021

Refactored how the processing is handled so that we see a speed improvement when using Numba and increasing the number of cores. Once I handle the issue below, I'll report back with some numbers on computation speed.

Multi-core processing required changes to the progress bar, which is still a work in progress. One of the key challenges is flushing stdout in a way that is compatible with Numba. While print statements are supported inside Numba-compiled functions, sys.stdout.flush() does not seem to be.
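
One possible way around this, sketched below under assumptions, is to keep the progress bar entirely outside the compiled code: the Python driver calls the kernel once per chunk and writes and flushes the bar between calls. The names `render_progress` and `run_chunks` are hypothetical, and the bar layout only imitates the output shown earlier in this thread.

```python
import sys


def render_progress(done, total, width=80):
    """Return a bar such as '[ ====    ] 50.00%' (hypothetical layout)."""
    fraction = done / total if total else 1.0
    fraction = min(max(fraction, 0.0), 1.0)
    filled = int(round(width * fraction))
    bar = "=" * filled + " " * (width - filled)
    return "[ " + bar + " ] " + f"{100 * fraction:.2f}%"


def run_chunks(n_items, chunk_size, work, stream=sys.stdout):
    """Call work(start, stop) per chunk and redraw the bar after each one.

    work can be a Numba-compiled kernel; the flush happens here, in
    ordinary Python, so the kernel never touches sys.stdout.
    """
    for start in range(0, n_items, chunk_size):
        stop = min(start + chunk_size, n_items)
        work(start, stop)
        stream.write("\r" + render_progress(stop, n_items))
        stream.flush()
    stream.write("\n")


if __name__ == "__main__":
    run_chunks(1000, 250, lambda start, stop: None)
```

The trade-off is granularity: the bar advances once per chunk rather than once per point.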

@vc1492a vc1492a added on hold This issue to be resolved at a later time and removed in progress This issue is being actively worked on labels Apr 29, 2024
@vc1492a
Owner Author

vc1492a commented Apr 29, 2024

Placing this issue on hold while other repository issues are resolved - this is low priority and can be resolved at a later time.

@vc1492a vc1492a added the low priority This issue is a lower priority relative to other open issues label Apr 29, 2024
vc1492a and others added 9 commits August 13, 2025 09:21
Updated `readme.md` to update the total number and monthly number of package downloads.
chore: remove Python 3.6 and 3.7 support
chore: update readme.md with another core library example
feat: refactor Validation class for ease of use
Rewrite the distance computation engine from scratch on top of v0.3.5:

- Vectorized kNN distances using NumPy broadcasting with chunked
  processing for memory efficiency and progress bar support
- Add n_jobs parameter for cross-cluster multiprocessing via
  concurrent.futures (n_jobs=-1 uses all cores)
- Restructure Numba path with non-generator kernels that support
  numba.prange for thread-level parallelism
- Optional scipy.spatial.distance.cdist and scipy.special.erf
  acceleration when scipy is available
- Vectorize _standard_distances, _prob_distances, and
  _norm_prob_outlier_factor pipeline methods
- Fully backward-compatible: all existing API calls work unchanged

Closes #36

Made-with: Cursor
@vc1492a vc1492a force-pushed the feature/numba_parallel branch from 5632d31 to 5e93be2 Compare March 20, 2026 18:12
@vc1492a vc1492a changed the title [WIP] - Feature/numba parallel feat: add parallel distance computation and vectorized pipeline Mar 20, 2026
@vc1492a vc1492a changed the base branch from dev to main March 20, 2026 18:12
@vc1492a vc1492a changed the base branch from main to dev March 20, 2026 18:13
Update version across loop.py, setup.py, and README badge.
Add changelog entry documenting all new features and improvements.

Made-with: Cursor

Labels

enhancement New feature of request help wanted Extra attention is needed low priority This issue is a lower priority relative to other open issues on hold This issue to be resolved at a later time

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parallelize

3 participants